Adjuvant therapeutic strategy decision support for an elderly population with localized breast cancer: A monocentric cohort retrospective study

Guidelines for the management of elderly patients with early breast cancer are scarce. Adding adjuvant systemic treatment to surgery for early breast cancer in elderly populations is complicated by the increase in comorbidities with age. In non-metastatic settings, treatment decisions are often made under considerable uncertainty; this commonly leads to undertreatment and, consequently, poorer outcomes. This study aimed to develop a decision support tool that can help to identify candidate adjuvant post-surgery treatment schemes for elderly breast cancer patients based on tumor and patient characteristics. Our approach was to generate predictions of patient outcomes for different courses of action; these predictions can, in turn, be used to inform clinical decisions for new patients. We used a cohort of elderly patients (≥ 70 years) who underwent surgery with curative intent for early breast cancer to train the models. We tested seven classification algorithms using 5-fold cross-validation, with 80% of the data being randomly selected for training and the remaining 20% for testing. We assessed model performance using accuracy, precision, recall, F1-score, and AUC score. We used an autoencoder to perform dimensionality reduction prior to classification. We observed consistently better performance using logistic regression and linear discriminant analysis models when compared to the other models we tested. Classification performance generally improved when an autoencoder was used, except when we predicted the need for adjuvant treatment. We obtained overall best results using a logistic regression model without autoencoding to predict the need for adjuvant treatment (F1-score = 0.869).


Introduction
Age is an established risk factor for breast cancer. The age threshold that typically characterizes elderly patients in high-income countries is 65 years. In the United States, the median age at diagnosis of breast cancer for women is 62 [1], and 30% of new breast cancer cases in 2020 were diagnosed in women aged 70 years or more [2]. In European Union countries, approximately 44% of breast cancer cases occur in women older than 65 years of age [3].
Guidelines for the management of elderly patients with early breast cancer are scarce, primarily due to the lack of evidence, including the lack of validation of online adjuvant therapy tools. As a result, in non-metastatic settings, treatment decisions are often made under considerable uncertainty; this commonly leads to undertreatment and, consequently, poorer outcomes [4,5].
Treatment plans for breast cancer vary depending on the type of breast cancer, its stage, and other factors such as patient preferences and overall health. In early breast cancer, current standard protocols typically consist of surgery accompanied by either radiation therapy, neoadjuvant or adjuvant systemic therapy, or a combination of these therapies. The choice of post-surgery treatment for elderly breast cancer patients is generally considered a difficult decision because these patients are often in a worse physiological state. Elderly patients are rarely included in randomized clinical trials and are underrepresented in meta-analyses showing a benefit of adjuvant chemotherapy with regard to breast-cancer mortality and overall survival [6,7]. In the absence of clinical trial results, an alternative approach is to use artificial intelligence (AI) to assist with treatment decision making.
Some of the earliest AI applications to provide cancer treatment recommendations were knowledge-based systems [8,9]. More recently, a variety of machine-learning approaches have been proposed to assist clinicians and/or breast cancer patients [10][11][12][13][14][15]. However, recommendations from existing decision support tools are usually relevant for patients aged 18 to 65 years, which is the age range for which most of the advisory tools have been trained. Relatively few studies analyzed treatment outcomes for elderly breast cancer patients [16][17][18][19].
One of the major prognostic tools in current clinical use for breast cancer is PREDICT (https://predict.nhs.uk) [20]. In spite of its popularity, PREDICT has been shown to underperform in specific subgroups of patients, in particular older patients [21]. The recently developed Adjutorium (https://vanderschaar-lab.com/adjutorium/) is a breast cancer prognostication and treatment benefit prediction model that outperforms PREDICT [19]. Adjutorium used large-scale publicly available datasets from the United Kingdom and the United States consisting primarily of patients aged 30-65 years, along with a smaller subset of older patients (age > 65 at diagnosis). Due to limitations of the datasets, Adjutorium did not include important tumor information such as progesterone receptor (PR) status. In this study, we generate models that address both these limitations. For one, our cohorts are highly representative of elderly breast cancer patients because they include only patients aged 70 years or more. In addition, we make use of an extensive dataset that includes administrative, biological, treatment, primary tumor, and survival data.
Here we present a data-driven prediction tool that can provide recommendations for post-surgery treatment for elderly breast cancer patients. Using data from a cohort of elderly women (≥ 70 years) diagnosed with cancer who underwent surgery with curative intent for early breast cancer, we predict all-cause mortality at 5 years in four clinically relevant scenarios. Using our models, it is possible to compare expected outcomes (e.g., difference in patient survival) for different treatment options, and thus generate an integrated view of what will likely happen to the patient in different treatment scenarios. This information can help oncologists to identify candidate adjuvant treatment schemes for elderly breast cancer patients based on tumor and patient characteristics.

Recruitment
This retrospective study used individual pseudonymized data collected from all consecutive elderly women (≥ 70 years) diagnosed with cancer who underwent surgery with curative intent (lumpectomy or mastectomy +/- axillary lymph node dissection) for early breast cancer in the French comprehensive Léon Bérard Cancer Center, from January 1997 to December 2016. There were no restrictions regarding breast cancer histological and molecular subtype, tumor size (from pT1 to pT4) or lymph node status (from pN0 to pN3). Patients were excluded in case of in situ carcinoma without infiltrative carcinoma and in case of distant metastasis at the time of breast surgery. Because we were interested in 5-year survival among elderly patients, we only included patients who were followed for at least five years and for whom information on vital status was available. A total of 976 patients met these inclusion criteria. We used the ConSore software to build our database. ConSore is a data mining tool developed by UNICANCER, a French academic cancer research organization [22]. Natural language processing is used to select patient cohorts and extract data from electronic medical records (EMR), providing a homogeneous collection of meaningful information. Information is extracted in a structured form according to research criteria. A second, human check was nevertheless carried out on all EMRs studied.
Data collected in our analysis included the following patient characteristics at early breast cancer diagnosis: age, Eastern Cooperative Oncology Group performance status, body mass index (BMI), comorbidities (diabetes, cardiac insufficiency, coronary insufficiency, chronic obstructive pulmonary disease (COPD) and cognitive disorders), hospitalization history in the previous year, and polypharmacy (≥ 5 medications a day). The following biologic measures at diagnosis were collected: hemoglobin, lymphocyte count, albuminemia, and creatinine clearance (Cockcroft-Gault). The following data about disease characteristics were extracted: histological subtype, hormone receptor status, Human Epidermal Growth Factor Receptor-2 (HER2) status, Scarff-Bloom and Richardson (SBR) grade, number of tumors, size of the biggest tumor, and lymph node involvement according to the TNM classification [23]. The expression of estrogen receptors (ER), progesterone receptors (PR), and HER2 status were obtained from the histopathological results of the pre-therapeutic biopsy. Hormone receptor negativity was defined as less than 10% of cells staining for estrogen and progesterone receptors. The expression of HER2 was considered negative when immunohistochemistry staining was lower than 1+. For tumors with a score of 2+, an additional in situ hybridization determined HER2 amplification or non-amplification [24]. Data about cancer treatments included: type of surgery, lymph node dissection, adjuvant radiotherapy, adjuvant chemotherapy, adjuvant HER2 targeted therapy, and adjuvant endocrine therapy. Continuous variables were categorized based on expert opinion, while categorical variables with more than two categories were dichotomized by creating a binary column for each category (S1 Table).
The present analysis received approval from the French Data Protection Authority (Commission Nationale de l'Informatique et des Libertés, authorization no. 9191415; October 10, 2019) and was built in compliance with French and European regulations.
Data access. To apply, the study coordinator should adopt the following procedure: (1) Assess the procedure applicable to the research project; (2) Perform an assessment of impact regarding data protection, as needed; (3) Ensure security of the system is kept at state of the art; (4) Document the conformity of the treatment to the proceedings; (5) Respect the framework set by the internal research, the simplified procedure or the authorization throughout the duration of data processing; (6) Record each processing operation into the registry of processing activities. Requests for assistance with this procedure should be made to the Health Data Hub by contacting Ms. Valérie Edel (valerie.edel@health-data-hub.fr).

PLOS ONE

Statistical analysis
Autoencoder. To adequately model a high-dimensional dataset, it is advantageous to perform dimensionality reduction prior to classification. Autoencoders are an efficient dimensionality reduction technique [25]. An autoencoder is a type of artificial neural network that learns a representation of the dataset while ignoring noise in the data; it can compress existing and missing information together, without the need for removing or imputing missing values. In this study, we used a classical autoencoder with one encoding function and one decoding function, and a binary cross entropy loss function.
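As an illustration of the architecture described above, the following is a minimal NumPy sketch of an autoencoder with one encoding function, one decoding function, and a binary cross-entropy loss. It is not the implementation used in this study: the latent size, learning rate, number of epochs, sigmoid activations, and the synthetic binary data are all assumptions made for the example, and the sketch does not address missing values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(x, y, eps=1e-9):
    """Mean binary cross-entropy between inputs x and reconstructions y."""
    y = np.clip(y, eps, 1.0 - eps)
    return float(-np.mean(x * np.log(y) + (1.0 - x) * np.log(1.0 - y)))

def train_autoencoder(x, n_latent=5, lr=0.2, epochs=300, seed=0):
    """Fit a one-layer encoder and one-layer decoder by full-batch gradient descent.

    All hyperparameters here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w_enc = rng.normal(0.0, 0.1, (d, n_latent))
    b_enc = np.zeros(n_latent)
    w_dec = rng.normal(0.0, 0.1, (n_latent, d))
    b_dec = np.zeros(d)
    losses = []
    for _ in range(epochs):
        h = sigmoid(x @ w_enc + b_enc)   # encoding function
        y = sigmoid(h @ w_dec + b_dec)   # decoding function
        losses.append(bce(x, y))
        # For a sigmoid output with cross-entropy loss, the gradient with
        # respect to the decoder pre-activation is (y - x).
        g_dec = (y - x) / n
        g_enc = (g_dec @ w_dec.T) * h * (1.0 - h)
        w_dec -= lr * (h.T @ g_dec)
        b_dec -= lr * g_dec.sum(axis=0)
        w_enc -= lr * (x.T @ g_enc)
        b_enc -= lr * g_enc.sum(axis=0)

    def encode(data):
        """Map data to the learned lower-dimensional representation."""
        return sigmoid(data @ w_enc + b_enc)

    return encode, losses

# Synthetic binary data standing in for dichotomized clinical variables.
rng = np.random.default_rng(1)
probs = np.linspace(0.1, 0.9, 20)
x_demo = (rng.random((200, 20)) < probs).astype(float)

encode, losses = train_autoencoder(x_demo)
codes = encode(x_demo)   # compressed features that a classifier could consume
```

The compressed matrix `codes` plays the role of the reduced dataset that would be passed to the classifiers in the first set of analyses.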
To assess the difference in performance due to the use of an autoencoder, we performed two sets of analyses. In the first one, we performed dimensionality reduction of our data using an autoencoder and subsequently generated predictive models. In the other, the autoencoder was not used prior to data modeling.
Algorithms and performance measures. Predictive modeling is a branch of machine learning that uses data mining to predict results. In this study, we tested seven classification algorithms and compared their performance in predicting discrete outcomes of interest. We used the Random Forest, Decision Tree, Naïve Bayes, Linear Discriminant Analysis, Logistic Regression, Nearest Shrunken Centroids, and Neural Network algorithms. We performed all analyses in the Python programming language (Python Software Foundation, https://www.python.org) using the sklearn [26] and imblearn [27] libraries with default parameter values.
We used 5-fold cross-validation, with 80% of the data being randomly selected for training and the remaining 20% for testing. We assessed model performance using accuracy, precision, recall, F1-score, and AUC score. We report averaged performance values over the five executions. Due to the imbalanced nature of our dataset, we selected our best-performing models based on the F1-score.
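The evaluation protocol described above can be sketched with scikit-learn as follows. This is an illustrative outline under stated assumptions, not the study's code: the data are synthetic, `GaussianNB` is an assumed choice of Naïve Bayes variant (the variant is not named in the text), `NearestCentroid` is used with its default settings (its `shrink_threshold` parameter controls shrinkage), and imblearn is not used. AUC is left out of the sketch because not every estimator exposes the scores it requires in all scikit-learn versions; `"roc_auc"` can be added to the metric list for estimators that do.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestCentroid
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the cohort (assumption for illustration).
X, y = make_classification(
    n_samples=300, n_features=20, weights=[0.8, 0.2], random_state=0
)

# Seven classifiers; random_state is fixed only to make the sketch repeatable.
models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Linear Discriminant Analysis": LinearDiscriminantAnalysis(),
    "Logistic Regression": LogisticRegression(),
    "Nearest Shrunken Centroids": NearestCentroid(),
    "Neural Network": MLPClassifier(random_state=0),
}

# 5-fold cross-validation: each fold trains on 80% and tests on 20%.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
metrics = ["accuracy", "precision", "recall", "f1"]

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=metrics)
    # Average each metric over the five folds.
    results[name] = {m: float(scores[f"test_{m}"].mean()) for m in metrics}

# Select the best model by mean F1-score, as done for imbalanced data.
best_model = max(results, key=lambda name: results[name]["f1"])
```

Ranking by `results[name]["f1"]` mirrors the choice of the F1-score as the selection criterion for an imbalanced dataset.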
Predicted outcomes. Overall survival is widely accepted as the gold-standard primary endpoint. Because our goal was to develop a decision aid to assist clinicians in the choice of post-surgery treatment for elderly breast cancer patients, we focused on 5-year overall survival. Our approach was to generate outcome predictions for different courses of action using data from our cohort. The first outcome of interest was all-cause mortality at 5 years, where the objective was to predict whether a patient will die within five years from the date of surgery. The minority (positive) class consists of patients who have died within five years. The second outcome was the need for adjuvant treatment, where we consider that the choice of treatment was correct if the patient survived at least five years after surgery. In this case, we predicted whether a patient had any adjuvant treatment, given that they have survived. Hence, we are only interested in patients who have survived at least five years after surgery. The positive class consists of patients who did not undergo adjuvant treatment. The third predicted outcome was the need for adjuvant chemotherapy, where we assume that patients who underwent chemotherapy and survived at least five years after surgery were correctly treated, while the opposite holds for patients who had chemotherapy and did not survive at least five years. The positive class includes patients who underwent adjuvant chemotherapy. Finally, the fourth outcome was death after chemotherapy, where we aim to distinguish the patients who should not have undergone chemotherapy after surgery. The positive class is the group of patients who should not have undergone chemotherapy, i.e., patients who underwent chemotherapy after surgery and died within five years after surgery.
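To make the four outcome definitions concrete, the sketch below builds the patient subset and binary labels for each task from three per-patient flags. The record type, field names, and task names are hypothetical and introduced only for this example. One point is an assumption on our reading of the text: for the need for adjuvant chemotherapy, the sketch restricts the data to five-year survivors by analogy with the need for adjuvant treatment.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Patient:
    # Hypothetical field names, for illustration only.
    died_within_5y: bool   # died within five years of surgery
    any_adjuvant: bool     # received any adjuvant treatment
    chemo: bool            # received adjuvant chemotherapy

def build_task(patients: List[Patient], task: str) -> Tuple[List[Patient], List[int]]:
    """Return the patients kept for a task and their labels (1 = positive class)."""
    if task == "death_5y":
        # All patients; positive class = died within five years.
        kept = list(patients)
        labels = [int(p.died_within_5y) for p in kept]
    elif task == "need_adjuvant":
        # Survivors only; positive class = no adjuvant treatment.
        kept = [p for p in patients if not p.died_within_5y]
        labels = [int(not p.any_adjuvant) for p in kept]
    elif task == "need_chemo":
        # Assumption: survivors only; positive class = received chemotherapy.
        kept = [p for p in patients if not p.died_within_5y]
        labels = [int(p.chemo) for p in kept]
    elif task == "death_after_chemo":
        # Chemotherapy patients only; positive class = died within five years.
        kept = [p for p in patients if p.chemo]
        labels = [int(p.died_within_5y) for p in kept]
    else:
        raise ValueError(f"unknown task: {task}")
    return kept, labels

# Four toy patients covering each combination used above.
cohort = [
    Patient(died_within_5y=False, any_adjuvant=True, chemo=True),
    Patient(died_within_5y=False, any_adjuvant=False, chemo=False),
    Patient(died_within_5y=True, any_adjuvant=True, chemo=True),
    Patient(died_within_5y=True, any_adjuvant=False, chemo=False),
]
```

For example, `build_task(cohort, "death_after_chemo")` keeps only the two patients who received chemotherapy and labels the one who died as positive.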

Results
Patient demographics and characteristics were evaluated on the date of breast cancer diagnosis. Algorithm performance measures are shown in Tables 2-5. Across all cases, logistic regression and/or linear discriminant analysis were the best performing models. We found that the use of autoencoding generally improved model performance, except when we predicted the need for adjuvant treatment.
To further assess the impact of the autoencoding step on model performance, we analyzed the information loss associated with the use of an autoencoder. Autoencoders generate a representation of a given input dataset using fewer dimensions than the original dataset, and this encoding process may lead to information loss. Information loss is the increase in entropy caused by transforming a dataset and is calculated by comparing the entropy of the dataset before and after the transformation. In this study, we used an autoencoder to compress the input dataset before each predictive task. In other words, we generated one lower-dimensional dataset to predict death within five years, another to predict the need for adjuvant treatment, and so on. We used the following equation to compute entropy, where p(x) denotes the probability of each possible outcome [28]:
$$H(X) = -\sum_{x} p(x) \log p(x)$$
In Table 6, we report the average value and standard deviation of the information loss across 100 runs of the autoencoding step. In this analysis, a positive value indicates that the entropy of the compressed dataset is larger than that of the original dataset. Interestingly, we obtained overall best results using a logistic regression model without autoencoding to predict the need for adjuvant treatment (F1-score = 0.869).
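The entropy comparison described above can be sketched in a few lines of Python. This is a hedged illustration rather than the study's procedure: it assumes Shannon entropy in bits (log base 2), estimates p(x) from the empirical frequencies of discrete values, and treats the dataset as a flat sequence of values; continuous autoencoder outputs would need to be discretized first, and how that was done is not stated in the text.

```python
import math
from collections import Counter

def shannon_entropy(values, base=2.0):
    """Entropy of the empirical distribution of a sequence of discrete values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())

def information_loss(original, transformed):
    """Entropy after the transformation minus entropy before it.

    A positive value means the transformed data has higher entropy.
    """
    return shannon_entropy(transformed) - shannon_entropy(original)

# Toy example: a constant column versus a balanced binary column.
before = [0, 0, 0, 0]
after = [0, 1, 0, 1]
loss = information_loss(before, after)
```

With this sign convention, a positive `loss` corresponds to the case where the compressed dataset has larger entropy than the original.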
To provide clinical insight, we performed a feature importance analysis to identify the most predictive features for each outcome of interest. Feature importance techniques assign a score to each input feature of a predictive model that indicates the relative importance of that feature when making a prediction. Inspecting the importance scores shows which features are the most and least important for the model.
Linear algorithms, such as linear discriminant analysis and logistic regression, fit a model where the predicted output is the weighted sum of the input values. These algorithms determine the set of coefficients to be used in the weighted sum in order to make a prediction. If the input variables have the same scale, or have been scaled prior to fitting the model, the resulting coefficients can be used directly as a type of feature importance score. In this work, we used the sklearn library in Python to retrieve the attribute coef_, which contains the coefficients for each input variable of the linear discriminant analysis and logistic regression models. We then ranked all coefficients in decreasing order and retained the five highest ranked coefficients for each model.
Decision tree algorithms and ensembles of decision trees, such as random forest, offer importance scores based on the reduction in the criterion used to select split points, like Gini or entropy. After fitting the models using the sklearn library in Python, we retrieved the attribute feature_importances_, which contains the relative importance score for each input feature of the decision tree and random forest models. We then ranked all scores in decreasing order and retained the five highest ranked scores for each model.
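The two ranking procedures described above, for linear models and for tree-based models, can be sketched as follows. The data, feature names, and the helper `top_five` are hypothetical and serve only to illustrate the idea; in scikit-learn the fitted attributes are `coef_` for logistic regression and `feature_importances_` for random forests. Inputs are standardized here so that coefficients are comparable, which is an assumption of the sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data with placeholder feature names (assumptions for illustration).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X = StandardScaler().fit_transform(X)   # put all inputs on the same scale
names = [f"feature_{i}" for i in range(X.shape[1])]

def top_five(scores, feature_names):
    """Rank scores in decreasing order and keep the five highest."""
    order = np.argsort(scores)[::-1][:5]
    return [(feature_names[i], float(scores[i])) for i in order]

# Linear model: coefficients of the weighted sum serve as importance scores.
linear = LogisticRegression().fit(X, y)
top_linear = top_five(linear.coef_[0], names)

# Tree ensemble: importances derive from the reduction in the split criterion.
forest = RandomForestClassifier(random_state=0).fit(X, y)
top_forest = top_five(forest.feature_importances_, names)
```

Note that ranking signed coefficients in decreasing order, as done here, favors features with large positive weights; ranking by absolute value would be a different design choice.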
Results from this analysis are shown in Tables 7-10. The entries (numbers 1 through 5) indicate the rank of each feature for each algorithm. For example, in Table 7 feature 'Post-operative radiotherapy' was ranked as the second most predictive feature in the Random Forest model and fifth most predictive feature in the Decision Tree model. Overall, the most predictive features across two or more models for any predictive task were 'lymph node invasion (category 0-1)', 'adjuvant endocrine therapy', and 'post-chemotherapy radiotherapy'. Unsurprisingly, when predicting the need for adjuvant treatment, 'adjuvant endocrine therapy' was the most predictive feature across all four models. This result points to the importance of assessing the risk, or need, of adjuvant treatment in an elderly population. Classically, and as expected, the most predictive feature of 5-year overall survival is 'Lymph node invasion-category 0-1'. Similarly, 'Lymph node invasion-category 0-1' is the most predictive feature to predict the need for adjuvant chemotherapy. This implies that the extent of lymph node invasion can help to determine the likelihood of needing additional chemotherapy. In Table 10, 'Lymph node invasion-category 0-1' is again a highly predictive feature when predicting death after chemotherapy. These results highlight the importance of lymph node invasion status as a key factor in patient outcomes.

Table 6. Information loss due to autoencoding.

Table 8 (excerpt; feature, rank). Excision limit, 4; Biggest tumor size-category 50-1000 mm, 4; SBR grade-category 3, 4; Type of surgery-Lumpectomy, 5; Site of radiotherapy-Left breast, 5; Monoclonal antibody therapy-Adjuvant, 5. https://doi.org/10.1371/journal.pone.0290566.t008

Discussion
Providing the most appropriate adjuvant treatment for elderly patients with early breast cancer represents a daily challenge for oncologists; this is partly due to the higher incidence of comorbid conditions in this frail population. Existing data provide little strong evidence to support recommendations, and international guidelines do not provide tangible guidance for this group of patients. Our goal was to generate a decision aid for clinical decision-making on post-surgery treatment for elderly breast cancer patients. We identified four relevant scenarios that could assist clinicians in the choice of adjuvant therapy. First, we predicted whether a patient would die or not within five years after surgery. In this case, we assumed that we know which treatment a patient has received. Second, we predicted whether a patient would need any type of adjuvant treatment in order to survive at least five years. In a similar manner, we predicted whether a patient would need adjuvant chemotherapy in order to survive at least five years. We focused especially on chemotherapy because this decision is considered particularly complex and often feared by patients. Finally, we predicted whether the choice of chemotherapy was a good one, considering all patients who underwent this type of adjuvant therapy. In all our analyses, we assumed that a treatment was successful if the patient survived at least five years after surgery.
When predicting death within five years, 'lymph node invasion (category 0-1)' was the most predictive feature across three of the four models. Features 'lymph node invasion (category 0-1)' and 'post-chemotherapy radiotherapy' were most predictive of the need for adjuvant chemotherapy. Finally, 'lymph node invasion (category 0-1)' was again the most predictive feature of death after chemotherapy for two of the four models. These results are particularly interesting for oncologists because the aforementioned features relate only to tumor characteristics. In fact, neither the patient's age nor comorbidities were ranked as highly significant prognostic factors in our models. This, in turn, suggests that the therapeutic management of elderly patients with localized breast cancer must be conducted similarly to what is done for other age groups. These findings are consistent with other retrospective studies, which report that the absence of adjuvant chemotherapy in this population may have an impact on the chances of overall survival [29][30][31][32].
We observed consistently better performance using logistic regression (LR) and linear discriminant analysis (LDA) models. LR outperformed all other models in two of the four predictive tasks. For the other two tasks, LR and LDA showed comparable predictive power. LR models suggest well-defined relationships that are typically highly interpretable. However, this interpretability may be hindered by the use of an autoencoder due to the trade-off between accuracy and interpretability. This occurs because autoencoding generates a compressed representation of the initial feature space, and it is usually infeasible to associate a clinical meaning with the compressed features. We observed that classification performance generally improved when an autoencoder was used, except when we predicted the need for adjuvant treatment. Interestingly, we obtained overall best results using LR without autoencoding to predict the need for adjuvant treatment (F1-score = 0.869).
We acknowledge limitations of our study, such as the imbalanced nature of our dataset, meaning that not all response classes included similar numbers of patients. Additional work could analyze the effect of class imbalance on model performance. Moreover, in this work we did not stratify patients based on, e.g., hormone receptor status or clinical staging prior to modeling. In the future, additional models could be generated for specific patient subgroups, and external validation could then be performed for both general and subgroup-specific models. We also acknowledge limitations in our dichotomous definition of outcome, where we consider that a choice is correct if the patient survives at least five years. Future work could consider a multi-factorial outcome definition, by defining, e.g., a multi-objective function that incorporates different (possibly weighted) endpoints. Another limitation is that we did not predict outcomes such as recurrence. Finally, it would be relevant to evaluate our results in light of oncogeriatric frailty scores. However, these geriatric evaluations are unfortunately still too infrequent in daily clinical practice to allow for such evaluation [33,34].
Supporting information S1